Abstract

In this paper, we review recent results of ours concerning branching processes with 
general lifetimes and neutral mutations, under the infinitely many alleles model, where 
mutations can occur either at birth of particles or at a constant rate during their lives. 

In both models, we study the allelic partition of the population at time t. We give 
closed-form formulae for the expected frequency spectrum at t and prove pathwise convergence
to an explicit limit, as $t \to +\infty$, of the relative numbers of types younger than some
given age and carried by a given number of particles (small families). We also provide 
convergences in distribution of the sizes or ages of the largest families and of the oldest 
families. 

In the case of exponential lifetimes, population dynamics are given by linear birth and 
death processes, and we can most of the time provide general formulations of our results 
unifying both models.
allele model, frequency spectrum.

1 Introduction

We consider a general branching model, where particles have i.i.d. (not necessarily exponen- 
tial) life lengths and give birth at constant rate b during their lives to independent copies of 
themselves. The genealogical tree thus produced is called a splitting tree [12, 13, 22]. The process
that counts the number of alive particles through time is a Crump-Mode-Jagers process
(or general branching process) [18] which is binary (births occur singly) and homogeneous 
(constant birth rate). 

We enrich this genealogical model with mutations. In Model I, each child is a clone of her 
mother with probability $1 - p$ and a mutant with probability $p$. In Model II, independently
of other particles, each particle undergoes mutations during her life at constant rate $\theta$ (and
births are always clonal). For both models, we are working under the infinitely many alleles 
model, that is, a mutation yields a type, also called allele, which was never encountered be- 
fore. Moreover, mutations are supposed to be neutral, that is, they do not modify the way 
particles die and reproduce. For any type and any time t, we call family the set of all particles 
that share this type at time t. 

Branching processes (and especially birth and death processes) with mutations have many 
applications in biology. In carcinogenesis [28, 17, 32, 9, 8, 7], they can model the evolution 
of cancerous cells. In [21], Kendall modeled carcinogenesis by a birth and death process 
where mutations occur during life according to an inhomogeneous Poisson process. In [7, 9], 
cancerous cells are modeled by a multitype branching process where a cell is of type k if it 
has undergone k mutations and where the more a cell has undergone mutations, the faster it 
grows. The object of study is the time of appearance of the first cell of type k. In [32], the 
authors study the arrival time of the first resistant cell and the number of resistant cells, in a 
model of cancerous cells undergoing a medical treatment and becoming resistant after having 
experienced a certain number of mutations. 

Branching processes with mutations are also used in epidemiology. Epidemics, and espe- 
cially their onset, can be modeled by birth and death processes, where particles are infected 
hosts, births are disease transmissions and deaths are recoveries or actual deaths. In [33], 
Stadler provides a statistical method for the inference of transmission rates and of the re- 
productive value of epidemics in a birth and death model with mutations. In [24], Lambert 
& Trapman enriched the transmission tree with Poissonian marks modeling detection events 
of hospital patients infected by an antibiotic-resistant pathogen. They provided an inference 
method based on the knowledge of times spent by patients at the hospital at the detection of 
the outbreak. 

Let us also mention the existence of models, e.g. [11], of phage reproduction within a 
bacterium by a (possibly time-inhomogeneous) birth and death process with Poissonian mutations,
where particles model phage in the vegetative phase (DNA strands in the bacterium
without protein coating) and death is interpreted as phage maturing (reception of protein
coating).

In ecology, the neutral theory of biodiversity [16] gives a prediction of the diversity pat- 
terns, in terms of species abundance distributions, that are generated by individual-based 
models where speciation is caused by mutation or by immigration from mainland. Usually, 
the underlying genealogical models are assumed to keep the population size constant through 
time, as in the Moran or Wright-Fisher models, and so have the same well-known properties 
as models in mathematical population genetics (e.g., Ewens sampling formula), with a dif- 
ferent interpretation. See [15, 23] for cases where this assumption is relaxed in favor of the 
branching property. 

In this paper, we are first interested in the allelic partition of the population and more 
precisely in properties about the frequency spectrum $(M_t^{i,a}, i \geq 1)$, where $M_t^{i,a}$ is the number
of distinct types younger than $a$ (i.e., whose original mutation appeared after $t - a$) carried
by exactly i particles at time t. This kind of question was first studied by Ewens [10] who 
discovered the well known 'sampling formula' named after him and which describes the law 
of the allelic partition for a Wright-Fisher model with neutral mutations. 

In our models, it is not possible to obtain a counterpart of Ewens sampling formula but 
we obtain different kinds of results concerning the frequency spectrum $(M_t^{i,a}, i \geq 1)$. First, we
get a closed-form formula for the expected frequency spectrum, even in the non-Markovian
cases. Second, we get pathwise convergence results as $t \to +\infty$ on the survival event, of the
relative abundances of types. Third, we investigate the order of magnitude of the sizes of the
largest families at time $t$ and of the ages of oldest types at time $t$, as $t \to +\infty$, and show
convergence in distribution of these quantities properly rescaled. Several regimes appear, 
depending on whether the clonal process, which is the process counting particles of a same 
type, is subcritical, critical or supercritical. 

We do not know of previous mathematical studies, other than ours, on branching processes 
with Poissonian mutations, but there are several existing mathematical results on branching 
models with mutations at birth that we now briefly review. 

In discrete time, Griffiths and Pakes [14] studied the case of a Bienaymé-Galton-Watson
(BGW) process where at each generation, all particles mutate independently with some probability
$u$. The authors obtained properties about the number of alleles/types in the population,
about the time of last mutation in the (sub)critical case and about the expected frequency 
spectrum. In [3, 4], Bertoin considers an infinite alleles model with neutral mutations in 
a subcritical or critical BGW-process where particles independently give birth to a random 
number of clonal and mutant children according to the same joint distribution. In [3], the 
tree of alleles is studied, where all particles of a common type are gathered in clusters and 
the law of the allelic partition of the total population is given by describing the joint law of 
the sizes of the clusters and of the numbers of their mutant children. In [4], Bertoin obtains 
the joint convergence of the sizes of allelic families in the limit of large initial population size 
and small mutation rate. 

In continuous time, Pakes [29] studied Markovian branching processes and gave the coun- 
terpart in the time-continuous setting, of properties found in the previously cited paper [14]. 
In particular, his results about the frequency spectrum and the "limiting frequency spectrum" 
are similar to ours, stated in Section 3. Recently, Maruvka et al. [26, 25] have considered
the linear birth and death process with Poissonian mutations. Actually, they rather studied
a PDE satisfied by a concentration $n(x, t)$ which can be seen as (but is not proved to be) a
deterministic approximation to the number of families of size $x$ at time $t$. It is remarkable that
this PDE has a steady concentration $n(x)$, whose behavior as $x \to \infty$ is comparable to the
asymptotic behavior of the relative numbers of families of size $m$ as $m \to +\infty$ in the discrete
model studied here and in [14]. In the monograph [34], Taïb is interested in general branching
processes known as Crump-Mode-Jagers processes (see [18, 19] and references therein) where
mutations still occur at birth but with a probability that may depend for example on the age
of the mother. He obtained limit theorems about the frequency spectrum by using random
characteristics techniques but in most cases, limits cannot be explicitly computed. Some of
our results in Model I are applications of Taïb's, but use techniques specific to splitting trees
to yield explicit formulae. We have refrained from applying results of Taïb on the convergence in
distribution of properly rescaled sizes of largest families, on the validity of which we have
doubts in the case of supercritical clonal processes (see last section).

The paper is organized as follows. In Section 2, we define the models and give some of 
their properties that will be useful to state the main results. Section 3 is devoted to the study 
of the frequency spectrum (small families). Finally, in Section 4, we give the results about 
ages of the oldest families and about sizes of the largest ones. 

Notice that in this paper, most of the results are stated for linear birth and death pro- 
cesses in order to simplify the notation. Most of them are also true with general life length 
distributions and are proved in Chapter 3 of the PhD thesis [30] for Model I, and in [5, 6] 
for Model II. Particular effort has been put into finding a unifying formulation for our results
whenever it seemed possible.

2 The models 

2.1 Model without mutations 

We first define the model without mutations and give some of its properties. Afterwards, we 
will explain the two mutation mechanisms that we consider in this paper. 
As a population model, we consider splitting trees [12, 13, 22], that is, 

• At time t = 0, the population starts with one progenitor; 

• All particles have i.i.d. reproduction behaviors; 

• Conditional on her birth date $\alpha$ and her life length $\zeta$, each particle gives birth at a
constant rate $b \in (0, \infty)$ during $(\alpha, \alpha + \zeta)$, to a single particle at each birth event.

It is important to notice that the common law of life lengths can be as general as possible. 
Let $Z = (Z(t), t \geq 0)$ be the process counting the number of extant particles through time.
We denote the lifespan distribution by $\Lambda(\cdot)/b$, where $\Lambda$ is a finite positive measure on $(0, +\infty]$
with total mass $b$, called the lifespan measure [22].

The total population process $Z$ belongs to a large class of branching processes called
Crump-Mode-Jagers or CMJ processes. In these processes, also called general branching
processes [18, 19], one associates with each particle $x$ in the population a non-negative r.v.
$\lambda_x$ (her life length), and a point process $\xi_x$ called birth point process. One assumes that
the sequence $(\lambda_x, \xi_x)_x$ is i.i.d. but $\lambda_x$ and $\xi_x$ are not necessarily independent. Then, the
CMJ-process is defined as

\[ Z(t) := \sum_x \mathbf{1}_{\{\sigma_x \leq t < \sigma_x + \lambda_x\}}, \qquad t \geq 0, \]

where for any particle $x$ in the population, $\sigma_x$ is her birth time.

In our particular case, the common distribution of lifespans is $\Lambda(\cdot)/b$ and conditional on
her lifespan, the birth point process of a particle is distributed as a Poisson point process
during her life. We can say that the CMJ-process $Z$ is homogeneous (constant birth rate)
and binary (births occur singly). We will say that $Z$ is subcritical, critical or supercritical
according to whether the mean number of children per particle

\[ m := \int_0^\infty u \, \Lambda(du) \tag{1} \]

is less than, equal to or greater than 1.

The advantage of homogeneous, binary CMJ-processes is that they enable explicit computations,
e.g., about one-dimensional marginals of $Z$ (see forthcoming Proposition 2.1). More
precisely, for $\lambda \geq 0$, define

\[ \psi(\lambda) := \lambda - \int_{(0,\infty]} \left(1 - e^{-\lambda u}\right) \Lambda(du) \tag{2} \]

and let $r$ be the greatest root of $\psi$. Notice that $\psi$ is convex, $\psi(0) = 0$ and $\psi'(0) = 1 - m$. As
a consequence,

\[ r = 0 \ \text{ if $Z$ is subcritical or critical}, \qquad r > 0 \ \text{ if $Z$ is supercritical}. \tag{3} \]

Let $W$ be the so-called scale function [2, p. 194] associated with $\psi$, that is, the unique increasing
continuous function $(0, \infty) \to (0, \infty)$ satisfying

\[ \int_0^\infty W(x) e^{-\lambda x} \, dx = \frac{1}{\psi(\lambda)}, \qquad \lambda > r. \tag{4} \]

Proposition 2.1 (Lambert [22, 23]). The one-dimensional marginals of $Z$ are given by

\[ \mathbb{P}(Z(t) = 0) = 1 - \frac{W'(t)}{b W(t)} \]

and for $n \geq 1$,

\[ \mathbb{P}(Z(t) = n) = \left(1 - \frac{1}{W(t)}\right)^{n-1} \frac{W'(t)}{b W(t)^2}. \]

In other words, conditional on being non-zero, $Z(t)$ is distributed as a geometric r.v. with
success probability $1/W(t)$.

If $\mathrm{Ext} := \{Z(t) \to 0 \text{ as } t \to \infty\}$ denotes the extinction event of $Z$, according to [22], as a consequence
of the last proposition,

\[ \mathbb{P}(\mathrm{Ext}) = 1 - \frac{r}{b}. \]

Thus, thanks to (3), extinction occurs a.s. when $Z$ is (sub)critical and $\mathbb{P}(\mathrm{Ext}^c) > 0$ when it
is supercritical.

The following proposition justifies the fact that r is called the Malthusian parameter of 
the population in the supercritical case.

Proposition 2.2 (Lambert [22]). If $m > 1$, conditional on the survival event $\mathrm{Ext}^c$,

\[ \lim_{t \to \infty} e^{-rt} Z(t) = \mathcal{E} \quad \text{a.s.} \tag{5} \]

where $\mathcal{E}$ is exponential with parameter $\psi'(r)$.

In fact, convergence in distribution is proved in [22] and a.s. convergence holds according 
to [27] (see [31, p.285]). 

2.2 Two mutation models I and II 

We now assume that particles in the population carry types, also called alleles. We consider 
two population models where mutations appear in different ways. In each case, we will make 
the assumption of infinitely many alleles, that is, to every mutation event is associated a 
different type, so that every type appears only once. We will also assume that mutations are 
neutral, that is, they do not change the way particles die and reproduce. 

In Model I, mutations occur at birth. More precisely, there is some $p \in (0, 1)$ such that at
each birth event, independently of all other particles, the newborn is a clone of her mother
with probability $1 - p$ and a mutant with probability $p$. An illustration is given in Figure I.

In Model II, particles independently experience mutations during their lives at constant
rate $\theta > 0$. In particular, in contrast with Model I, particles can change type several times
during their lifetime, but always bear at birth the same type as their mother at this very
time. An illustration is given in Figure II.






Figure I: An example of a splitting tree in Model I and of the allelic partition of the whole 
extant population at time t. Vertical axis is time and horizontal axis shows filiation (horizontal 
lines have zero length). Full circles represent mutations at birth and thick lines, the clonal 
splitting tree of the ancestor up to time t. The different letters are the alleles of alive particles 
at time t.

Figure II: An example of a splitting tree with mutations in Model II and of the allelic partition
of the whole extant population at time t. Crosses represent mutations and thick lines, the 
clonal splitting tree of the ancestor up to time t. The different letters are the alleles of alive 
particles at time t. 



In what follows, an important role will be played by the clonal process, generically denoted
$Z_\star$, counting, as time passes, the number of particles bearing the same type as the progenitor
of the population at time 0. It can easily be seen that the genealogy of a clonal population
is again a splitting tree, so that $Z_\star$ is also a homogeneous, binary CMJ process. We denote
by $b_\star$ its birth rate, by $\psi_\star$ the associated convex function as in (2) and by $W_\star$ the non-negative
function with Laplace transform $1/\psi_\star$. Furthermore, when the clonal population is
supercritical, i.e. when $\psi_\star'(0+) < 0$, we denote by $r_\star$ its Malthusian parameter, which is the
only nonzero root of $\psi_\star$. We will sometimes need to have this generic notation depend on the
model considered: $Z_p, \psi_p, W_p, r_p$ for Model I, and $Z_\theta, \psi_\theta, W_\theta, r_\theta$ for Model II.

Concerning Model I, it can be seen [30] that the clonal splitting tree has the same life
lengths as the original splitting tree and birth rate $b_p = b(1 - p)$, so that its lifespan measure
is $(1 - p)\Lambda$ and

\[ \psi_p(\lambda) = p\lambda + (1 - p)\psi(\lambda), \qquad \lambda \geq 0. \]

In particular, as in (1), the clonal population is subcritical, critical or supercritical according
to whether $m(1 - p)$ is less than, equal to or greater than 1. It should be noted that there is
no closed-form formula for $W_p$.

Concerning Model II, it can be seen [5] that the clonal splitting tree has birth rate $b_\theta = b$
and life lengths distributed as $\min(X, Y)$ where $X$ has probability distribution $\Lambda(\cdot)/b$ and $Y$ is
an independent exponential r.v. with parameter $\theta$. Then we get

\[ \psi_\theta(\lambda) = \frac{\lambda \, \psi(\lambda + \theta)}{\lambda + \theta}, \qquad \lambda \geq 0. \]

In particular, $r_\theta = r - \theta$ and the clonal population is subcritical, critical or supercritical
according to whether $r$ is less than, equal to or greater than $\theta$. It can also be proved that $W$
and $W_\theta$ are differentiable and that their derivatives are related via

\[ W_\theta'(x) = e^{-\theta x} \, W'(x), \qquad x \geq 0, \]

with the requirement that $W_\theta(0) = 1$.

2.3 Exponential case 

An interesting case that we will focus on is the exponential (or Markovian) case, when the
common distribution of life lengths is exponential with parameter $d$ (with the convention that
lifespans are a.s. infinite if $d = 0$), that is, $\Lambda(du) = b \, d \, e^{-d u} \, du$ or $\Lambda(du) = b \, \delta_\infty(du)$. In that
case, $Z$ is respectively a linear birth and death process with birth rate $b$ and death rate $d$ or
a pure birth process (or Yule process) with parameter $b$.

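In this case, $Z$ and $Z_\star$ are Markov processes and the quantities defined in Section 2.1 are
computable. Indeed, we have

\[ \psi(\lambda) = \frac{\lambda(\lambda - r)}{\lambda + d}, \qquad r = b - d, \qquad m = 1 - \psi'(0) = \frac{b}{d} \quad \text{and} \quad \psi'(r) = 1 - \frac{d}{b}. \tag{6} \]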

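It is also possible to compute the function $W$, defined by (4), while it is generally unknown.
From [22, p. 393], we have

\[ W(x) = \begin{cases} \dfrac{b e^{rx} - d}{r} & \text{if } b \neq d, \\[1ex] 1 + bx & \text{if } b = d, \end{cases} \qquad x \geq 0, \]

and in all cases

\[ W'(x) = b e^{rx}, \qquad x \geq 0. \]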
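As a quick numerical sanity check (ours, not taken from the cited references), the explicit scale function of the exponential case can be plugged into the marginal law of Proposition 2.1. The sketch below assumes $W(x) = (b e^{rx} - d)/r$ for $b \neq d$, $W(x) = 1 + bx$ for $b = d$, and $W'(x) = b e^{rx}$; the function names and the parameter values are illustrative choices. It checks that the probabilities sum to 1 and that the mean equals $W'(t)/b = e^{rt}$.

```python
import math


def scale_w(x, b, d):
    """Scale function W(x) of a linear birth and death process (rates b, d)."""
    r = b - d
    if r == 0:
        return 1.0 + b * x
    return (b * math.exp(r * x) - d) / r


def scale_w_prime(x, b, d):
    """Derivative W'(x) = b * exp(r x), valid in all cases."""
    return b * math.exp((b - d) * x)


def marginal_pmf(n, t, b, d):
    """P(Z(t) = n) according to Proposition 2.1."""
    w = scale_w(t, b, d)
    alive = scale_w_prime(t, b, d) / (b * w)   # P(Z(t) != 0)
    if n == 0:
        return 1.0 - alive
    # Conditional on survival, Z(t) is geometric with success probability 1/W(t).
    return alive * (1.0 - 1.0 / w) ** (n - 1) / w


b, d, t = 2.0, 1.0, 1.5            # arbitrary supercritical example
total = sum(marginal_pmf(n, t, b, d) for n in range(5000))
mean = sum(n * marginal_pmf(n, t, b, d) for n in range(5000))
print(total, mean, math.exp((b - d) * t))
```

The truncation at 5000 terms is harmless here because the geometric tail is negligible for these parameters; larger values of $t$ would need a longer sum.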
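The same results hold for $W_\star$, by respectively replacing $b$, $d$ and $r$ by

\[ b_\star := b(1 - p), \qquad d_\star := d, \qquad r_\star := r - bp \tag{7-I} \]

in Model I and by

\[ b_\star := b, \qquad d_\star := d + \theta, \qquad r_\star := r - \theta \tag{7-II} \]

in Model II.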
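The unified notation can be summarized in a small helper. This is only an illustrative sketch: the function names `clonal_parameters` and `clonal_regime` are ours, and the formulas are those of (7-I) and (7-II) with $r = b - d$.

```python
def clonal_parameters(model, b, d, p=0.0, theta=0.0):
    """Return (b_star, d_star, r_star) in the exponential case.

    model "I": mutation at birth with probability p.
    model "II": mutation during life at rate theta.
    """
    r = b - d
    if model == "I":
        return b * (1.0 - p), d, r - b * p
    if model == "II":
        return b, d + theta, r - theta
    raise ValueError("model must be 'I' or 'II'")


def clonal_regime(r_star, tol=1e-12):
    """Classify the clonal process from the sign of r_star."""
    if r_star > tol:
        return "supercritical"
    if r_star < -tol:
        return "subcritical"
    return "critical"


params_one = clonal_parameters("I", b=2.0, d=1.0, p=0.25)
params_two = clonal_parameters("II", b=2.0, d=1.0, theta=1.5)
print(params_one, clonal_regime(params_one[2]))
print(params_two, clonal_regime(params_two[2]))
```

In both models $r_\star = b_\star - d_\star$, so the clonal process is itself a linear birth and death process whose regime is read off the sign of $r_\star$.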

We will sometimes state results in the total generality of splitting trees, in which case an
equation numbered (·-I) (resp. (·-II)) refers to Model I (resp. Model II), as done previously.
However, we will most of the time focus on the exponential case, in which case we will,
whenever possible, use the unified notation with $\star$'s. We will indicate when the results can be
generalized and will give precise references.
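To make the two mutation mechanisms concrete in the exponential case, here is a minimal Gillespie-type simulation sketch. It is an illustration under our own conventions, not the method used in the cited proofs: alleles are labelled by integers, the function name `simulate_alleles` is hypothetical, Model I is obtained with `theta = 0` and Model II with `p = 0`.

```python
import random
from collections import Counter


def simulate_alleles(b, d, t_max, p=0.0, theta=0.0, seed=0):
    """Simulate a linear birth and death process with neutral mutations.

    Each particle gives birth at rate b, dies at rate d and mutates at rate
    theta; a newborn is a mutant with probability p. Returns the list of
    alleles carried by the particles alive at time t_max.
    """
    rng = random.Random(seed)
    alleles = [0]                 # the progenitor carries allele 0
    next_allele = 1               # every mutation creates a fresh label
    per_capita = b + d + theta    # total event rate per particle
    t = 0.0
    while alleles:
        t += rng.expovariate(len(alleles) * per_capita)  # time of next event
        if t > t_max:
            break
        k = rng.randrange(len(alleles))    # particle concerned by the event
        u = rng.random() * per_capita      # type of event
        if u < b:                          # birth
            if rng.random() < p:           # mutant child (Model I)
                alleles.append(next_allele)
                next_allele += 1
            else:                          # clonal child
                alleles.append(alleles[k])
        elif u < b + d:                    # death: swap with last, then pop
            alleles[k] = alleles[-1]
            alleles.pop()
        else:                              # mutation during life (Model II)
            alleles[k] = next_allele
            next_allele += 1
    return alleles


model_one = simulate_alleles(b=1.5, d=0.5, t_max=3.0, p=0.2, seed=1)
model_two = simulate_alleles(b=1.5, d=0.5, t_max=3.0, theta=0.3, seed=1)
family_sizes = Counter(model_two)   # allele -> number of carriers at t_max
print(len(model_one), len(model_two), len(family_sizes))
```

Since a supercritical population grows exponentially, `t_max` should stay moderate; an empty list means that the population went extinct before `t_max`.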

Remark 2.3. In the exponential case, notice that Models I and II are two (incompatible)
cases of a more general class of linear birth and death processes with mutations, where particles
mutate spontaneously at rate $\theta$, die at rate $d$, give birth at rate $b$, and at each birth event: with
probability $p_2$, the mother and the daughter both mutate (and bear either the same new type
or two different new types); with probability $p_1$, the daughter (only) mutates; with probability
$p_0 = 1 - p_1 - p_2$, none of them mutates. Then Model I corresponds to the case when $\theta = p_2 = 0$
and Model II to the case when $p_1 = p_2 = 0$. The case studied by Pakes in [29] corresponds
to $\theta = 0$, $p_0 = u^2$, $p_1 = 2u(1 - u)$ and $p_2 = (1 - u)^2$. It is still an open question to check
whether, when our results hold for both Models I and II with the unified notation, they hold
for all linear birth and death processes with mutations.
3 Small families



Recall that a family is a maximal set of particles bearing the same type at the same given time. 
In this section, we are interested in results about small families that is, families whose sizes 
and ages are fixed, in opposition to those of Section 4 which concern asymptotic properties 
of the largest and oldest ones. 

More precisely, we give properties of the allelic partition of the entire population by
studying the frequency spectrum $(M_t^{i,a}, i \geq 1)$ where $M_t^{i,a}$ denotes the number of distinct
types, whose ages are less than $a$ at time $t$, carried by exactly $i$ particles at time $t$. Notice
that $M_t^{i,t}$ is simply the number of alleles carried by $i$ particles at time $t$ (regardless of their
ages).

For instance, in Figure I, the frequency spectrum $(M_t^{i,t}, i \geq 1)$ is $(3, 2, 1, 0, \ldots)$ because
three alleles (B, E, F) are carried by one particle, A and D are carried by two particles and C
is the only allele carried by three particles. Moreover, if we only consider families with ages
less than $a$, $(M_t^{i,a}, i \geq 1)$ equals $(3, 1, 0, \ldots)$ because alleles A and C appear in the population
before time $t - a$. Similarly, in Figure II, the frequency spectrum in Model II is $(4, 3, 0, \ldots)$.
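The Figure I example can be reproduced with a few lines of code. In this sketch the function name `frequency_spectrum` and the ordering of the letters are our own choices; only the family sizes (B, E, F once, A and D twice, C three times) and the fact that A and C are older than $a$ come from the text.

```python
from collections import Counter


def frequency_spectrum(alleles):
    """Return [M^1, M^2, ...]: M^i is the number of alleles carried by exactly i particles."""
    family_sizes = Counter(alleles)                # allele -> family size
    size_counts = Counter(family_sizes.values())   # size i -> number of alleles
    largest = max(size_counts, default=0)
    return [size_counts.get(i, 0) for i in range(1, largest + 1)]


# Alleles of the ten particles alive at time t in Figure I (order is arbitrary).
figure_one = list("AABCCCDDEF")
spectrum = frequency_spectrum(figure_one)
print(spectrum)   # [3, 2, 1]

# Restricting to families younger than a removes alleles A and C.
older_than_a = {"A", "C"}
young_spectrum = frequency_spectrum([x for x in figure_one if x not in older_than_a])
print(young_spectrum)   # [3, 1]
```

Trailing zeros of the spectrum are omitted, and $\sum_i i M^i$ always equals the number of particles.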

In the case of branching processes, there is no closed-form formula available for the law of
the frequency spectrum, as is the case for the Wright-Fisher model thanks to Ewens' sampling
formula [10]. Nevertheless, we obtained for both mutation models an exact computation of the
expected frequency spectrum and the almost sure asymptotic behavior of this frequency spectrum
as $t \to +\infty$.



3.1 Expected frequency spectrum 



We first give an exact expression of the expected frequency spectrum at any time t. 

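For $0 < a < t$ and $i \geq 1$, we denote by $M_t^{i,da}$ the number of types carried by $i$ particles at
time $t$ and with ages in $[a - da, a]$. The following proposition yields its expected value.

Proposition 3.1. For $0 < a < t$ and $i \geq 1$, we have

\[ \mathbb{E}[M_t^{i,da}] = \frac{p}{1-p} \, \frac{W'(t-a)}{b} \left(1 - \frac{1}{W_p(a)}\right)^{i-1} \frac{W_p'(a)}{W_p(a)^2} \, da \tag{8-I} \]

in Model I and

\[ \mathbb{E}[M_t^{i,da}] = \frac{\theta \, W'(t)}{b} \left(1 - \frac{1}{W_\theta(a)}\right)^{i-1} \frac{e^{-\theta a}}{W_\theta(a)^2} \, da \tag{8-II} \]

in Model II. In the exponential case, both expressions read as

\[ \mathbb{E}[M_t^{i,da}] = (r - r_\star) \, e^{rt} \left(1 - \frac{1}{W_\star(a)}\right)^{i-1} \frac{e^{-(r - r_\star)a}}{W_\star(a)^2} \, da. \tag{9} \]
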

In [30], (8-I) is proved in the general case. Its proof uses the branching property and basic
properties about Poisson processes. The main argument is that conditional on $Z(t - a)$, $M_t^{i,da}$
is the sum of $Z(t - a)$ independent r.v. distributed as the number of mutants that appear in
the population in a time interval $da$ and with $i$ clonal alive descendants at time $a$. The proof
of the general case of (8-II) in [5] is based on coalescent point processes.

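The expected frequency spectrum $\mathbb{E}[M_t^{i,a}]$ can be obtained by integrating (8-I) and (8-II)
over ages. Taking into account the contribution of the type of the progenitor, we can prove
the following result.

Corollary 3.2. For $a \leq t$ and $i \geq 1$,

\[ \mathbb{E}[M_t^{i,a}] = \frac{p}{b(1-p)} \int_0^a W'(t - x) \left(1 - \frac{1}{W_p(x)}\right)^{i-1} \frac{W_p'(x)}{W_p(x)^2} \, dx + \frac{1}{b(1-p)} \left(1 - \frac{1}{W_p(t)}\right)^{i-1} \frac{W_p'(t)}{W_p(t)^2} \, \mathbf{1}_{\{a = t\}} \tag{10-I} \]

in Model I and

\[ \mathbb{E}[M_t^{i,a}] = \frac{W'(t)}{b} \left[ \theta \int_0^a \left(1 - \frac{1}{W_\theta(x)}\right)^{i-1} \frac{e^{-\theta x}}{W_\theta(x)^2} \, dx + \left(1 - \frac{1}{W_\theta(t)}\right)^{i-1} \frac{e^{-\theta t}}{W_\theta(t)^2} \, \mathbf{1}_{\{a = t\}} \right] \tag{10-II} \]

in Model II. In the exponential case,

\[ \mathbb{E}[M_t^{i,a}] = (r - r_\star) \, e^{rt} \int_0^a \left(1 - \frac{1}{W_\star(x)}\right)^{i-1} \frac{e^{-(r - r_\star)x}}{W_\star(x)^2} \, dx + \mathbb{P}(Z_\star(t) = i) \, \mathbf{1}_{\{a = t\}}. \]

The second terms that appear in the r.h.s. correspond to the probabilities that the
progenitor has $i$ alive clonal descendants at time $t$. In the exponential case, we left this
probability as such, since its expression depends on the model. It is also possible to get
similar equations for the number of families with ages less than $a$ (resp. with size $i$) by
summing over $i$ (resp. by taking $a = t$) in the last expressions.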
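As a sanity check of ours (not taken from the references), one can verify in the exponential case that family sizes add up to the expected population size. Assume that the exponential-case formula of Corollary 3.2 with $a = t$ reads $\mathbb{E}[M_t^{i,t}] = (r - r_\star) e^{rt} \int_0^t (1 - 1/W_\star(x))^{i-1} e^{-(r - r_\star)x} W_\star(x)^{-2} \, dx + \mathbb{P}(Z_\star(t) = i)$. Using $\sum_{i \geq 1} i q^{i-1} = (1 - q)^{-2}$ with $q = 1 - 1/W_\star(x)$, the factor $W_\star(x)^{-2}$ cancels and

\[ \sum_{i \geq 1} i \, \mathbb{E}[M_t^{i,t}] = (r - r_\star) \, e^{rt} \int_0^t e^{-(r - r_\star)x} \, dx + \mathbb{E}[Z_\star(t)] = \left(e^{rt} - e^{r_\star t}\right) + e^{r_\star t} = e^{rt} = \mathbb{E}[Z(t)], \]

where $\mathbb{E}[Z(t)] = W'(t)/b = e^{rt}$ and $\mathbb{E}[Z_\star(t)] = W_\star'(t)/b_\star = e^{r_\star t}$ both follow from Proposition 2.1 and the exponential-case identity $W'(x) = b e^{rx}$.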

Remark 3.3. In the exponential case, when the process $Z$ is critical, that is, when $r = b - d = 0$,
for $a < t$,

\[ \mathbb{E}[M_t^{i,a}] = \frac{|r_\star|}{b_\star} \, \frac{1}{i} \left(1 - \frac{1}{W_\star(a)}\right)^{i}, \]

which is reminiscent of Fisher's log-series of species abundances [23]. Surprisingly, this expression
is independent of $t \in (a, \infty)$.

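From Corollary 3.2, we deduce the asymptotic behavior of $\mathbb{E}[M_t^{i,a}]$ in the supercritical
case.

Proposition 3.4. We suppose that $m > 1$. In the general case,

\[ \lim_{t \to +\infty} e^{-rt} \, \mathbb{E}[M_t^{i,a}] = \frac{r}{b \psi'(r)} \, J^{i,a} \tag{11} \]

where, for Model I,

\[ J^{i,a} := \frac{p}{1-p} \int_0^a \left(1 - \frac{1}{W_p(u)}\right)^{i-1} \frac{e^{-ru} \, W_p'(u)}{W_p(u)^2} \, du \tag{12-I} \]

and, for Model II,

\[ J^{i,a} := \theta \int_0^a \left(1 - \frac{1}{W_\theta(u)}\right)^{i-1} \frac{e^{-\theta u}}{W_\theta(u)^2} \, du. \tag{12-II} \]

In the exponential case, we get the simpler formula

\[ \lim_{t \to +\infty} e^{-rt} \, \mathbb{E}[M_t^{i,a}] = J^{i,a} = (r - r_\star) \int_0^a \left(1 - \frac{1}{W_\star(u)}\right)^{i-1} \frac{e^{-(r - r_\star)u}}{W_\star(u)^2} \, du. \]

Notice that $\mathbb{E}[M_t^{i,a}]$ grows exponentially with parameter $r$, as does $Z$ on its survival event.

3.2 Convergence results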

In this section and in all following ones, we are interested in long-time behaviors in the two
models we consider. Hence, from now on, we assume that the process $Z$ is supercritical.

This paragraph deals with improvements of the convergence results (11) regarding the
expected frequency spectrum. The following results yield the asymptotic behavior as $t \to +\infty$
of the frequency spectrum $(M_t^{i,a}, i \geq 1)$, conditional on the survival event.

The main technique we use to prove them is CMJ-processes counted with random characteristics
(see [18] and Appendix A in [34]). It enables us to obtain several pathwise convergence
results regarding some processes embedded in the supercritical splitting tree.

A characteristic is a random non-negative function on $[0, +\infty)$. To each particle $x$ in the
population is associated a characteristic $\chi_x$, which can be viewed as a score or a weight. It
must satisfy that $(\lambda_x, \xi_x, \chi_x)_x$ is an i.i.d. sequence, where we recall that $\lambda_x$ is the life length
of $x$ and $\xi_x$ its birth point process. Then, the process counted with the characteristic $\chi$ is defined
as

\[ Z^\chi(t) := \sum_x \chi_x(t - \sigma_x) \, \mathbf{1}_{\{\sigma_x \leq t\}}. \tag{13} \]

For instance, if $\chi_x(t) = \mathbf{1}_{\{t < \lambda_x\}}$, $Z^\chi$ equals $Z$ and if $\chi_x(t) = \mathbf{1}_{\{t < \lambda_x \wedge a\}}$, $Z^\chi(t)$ is the number of
extant particles at time $t$ with ages less than $a$. Then, provided technical conditions about
$\chi$ are satisfied, the convergences of $e^{-rt} Z^\chi(t)$ and of $Z^\chi(t)/Z(t)$ as $t \to +\infty$ hold a.s. on the
survival event. In our case, when $\chi$ is appropriately chosen, we can use this result to obtain
the following statements.
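The definition of a process counted with a characteristic can be illustrated on a toy population. In this sketch, which is ours, a particle is represented by a hypothetical pair (birth time, life length), a characteristic is a function of the age and of the life length, and the five particles are made-up data.

```python
def counted_with_characteristic(particles, chi, t):
    """Sum chi(age, life_length) over all particles born by time t."""
    return sum(chi(t - birth, life) for birth, life in particles if birth <= t)


def chi_alive(age, life):
    """Characteristic 1_{age < life length}: counts extant particles."""
    return 1 if age < life else 0


def chi_young(a):
    """Characteristic 1_{age < min(life length, a)}: extant particles younger than a."""
    def chi(age, life):
        return 1 if age < min(life, a) else 0
    return chi


# (birth time, life length) of five particles; made-up illustrative values.
particles = [(0.0, 2.5), (0.5, 1.0), (1.0, 3.0), (1.75, 0.5), (2.5, 1.0)]
z_total = counted_with_characteristic(particles, chi_alive, 2.0)
z_young = counted_with_characteristic(particles, chi_young(0.5), 2.0)
print(z_total, z_young)   # 3 1
```

At time 2.0 the second particle is already dead and the fifth is not yet born, which leaves three extant particles, only one of which is younger than 0.5.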

Proposition 3.5. Let $M_t$ be the number of extant types at time $t$. Almost surely, on the
survival event of $Z$,

\[ \lim_{t \to +\infty} e^{-rt} M_t = J \, \mathcal{E}, \]

\[ \lim_{t \to +\infty} e^{-rt} M_t^{i,a} = J^{i,a} \, \mathcal{E}, \tag{14} \]

where in Model I,

\[ J := \frac{rp}{1-p} \int_0^\infty e^{-ru} \ln(W_p(u)) \, du, \tag{15-I} \]

while in Model II,

\[ J := \theta \int_0^\infty \frac{e^{-\theta x}}{W_\theta(x)} \, dx, \tag{15-II} \]

and where $\mathcal{E}$ is the r.v. defined by (5).
In the exponential case, we have

\[ J = (r - r_\star) \int_0^\infty \frac{e^{-(r - r_\star)u}}{W_\star(u)} \, du. \]

Notice that (14) is consistent with (11) since $\mathbb{P}(\mathrm{Ext}^c) = r/b$ and $\mathbb{E}[\mathcal{E}] = 1/\psi'(r)$. Moreover,
(14) still holds after $M_t^{i,a}$ is replaced by $M_t^{i,t}$ and $J^{i,a}$ by $J^{i,\infty}$.

3.3 Asymptotic behavior of the limiting frequency spectrum

Thanks to Proposition 3.5, the proportion $M_t^{i,a}/M_t$ of types carried by $i$ particles and with
ages less than $a$ converges a.s. to $J^{i,a}/J$ as $t \to +\infty$. This limit is called "the limiting
frequency spectrum" by Pakes in [29]. This paragraph is devoted to the asymptotic behavior,
as $i \to +\infty$, of $J^i := J^{i,\infty}$, obtained by taking $a = \infty$ in (12-I) and (12-II). In the exponential
case,

\[ J^i = (r - r_\star) \int_0^\infty \left(1 - \frac{1}{W_\star(u)}\right)^{i-1} \frac{e^{-(r - r_\star)u}}{W_\star(u)^2} \, du. \tag{16} \]

3.3.1 Supercritical case 

In this paragraph, we only treat the exponential case. Let us assume that the clonal process
is supercritical, that is, $r_\star > 0$. Define

\[ \nu := \frac{r}{r_\star}, \qquad \mu := \frac{b_\star}{r_\star}, \qquad \gamma := \frac{r - r_\star}{b_\star}. \]

We have $\gamma = p/(1 - p)$ in Model I and $\gamma = \theta/b$ in Model II. Recall that $J^i$ is the proportion
of types carried by $i$ particles in the large time asymptotic.

Proposition 3.6. In the exponential case, we have for both models

\[ J^i \underset{i \to +\infty}{\sim} \gamma \, \Gamma(\nu + 1) \, \mu^{\nu} \, i^{-1-\nu}. \]

Notice that this result is consistent with [25] where Maruvka et al. use an approximation 
of the frequency spectrum by a concentration driven by a PDE, and with [29] where Pakes 
considers Markov branching processes with multiple simultaneous births, binomial mutations 
at birth and no Poissonian mutations. 

Remark 3.7. The following proof of Proposition 3.6 easily extends to any life length distribution
since it is based on Proposition 2.2 which holds in the general case.

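Proof of Proposition 3.6. Since $W_\star(t) \geq 1$ for $t \geq 0$, the sequence $(J^i)_{i \geq 1}$ is positive and
non-increasing. Then, according to a Tauberian theorem about series, to prove Proposition
3.6, it is sufficient to prove that $\sum_{i \geq j} J^i$ is equivalent to $j^{-\nu} \gamma \, \Gamma(\nu) \, \mu^{\nu}$ as $j \to +\infty$.
Recalling (16), we have

\[ \sum_{i \geq j} J^i = (r - r_\star) \int_0^\infty e^{-rt} \, \mathbb{P}(Z_\star(t) \geq j) \, dt \]

and from now on, we follow the proof of [29, Thm 3.2.1]. Let $s \geq 0$ be such that $j = e^{r_\star s}$.
Then $j^\nu = e^{rs}$ and

\[ j^\nu \int_0^\infty e^{-rt} \, \mathbb{P}(Z_\star(t) \geq j) \, dt = \int_{-s}^\infty e^{-rt} \, \mathbb{P}\left(Z_\star(t + s) \geq e^{r_\star s}\right) dt = \int_{-s}^\infty e^{-rt} \, \mathbb{P}\left(e^{-r_\star(t+s)} Z_\star(t + s) \geq e^{-r_\star t}\right) dt. \]

Using Proposition 2.2 and (6), $\mathbb{P}\left(e^{-r_\star(t+s)} Z_\star(t + s) \geq e^{-r_\star t}\right) \to \mathbb{P}\left(\mathcal{E}_\star \geq e^{-r_\star t}\right)$ as $s \to +\infty$, where $\mathcal{E}_\star$
is a non-negative r.v. such that

\[ \mathbb{P}(\mathcal{E}_\star = 0) = 1 - \frac{r_\star}{b_\star} \]

and conditional on $\{\mathcal{E}_\star > 0\}$, $\mathcal{E}_\star$ is an exponential r.v. with parameter $\psi_\star'(r_\star) = 1 - \frac{d_\star}{b_\star} = \frac{1}{\mu}$.
Moreover, using Markov inequality, for $\varepsilon > 0$,

\[ \mathbb{P}\left(e^{-r_\star(t+s)} Z_\star(t + s) \geq e^{-r_\star t}\right) \leq e^{r_\star(\nu + \varepsilon)t} \, \mathbb{E}\left[\left(e^{-r_\star(t+s)} Z_\star(t + s)\right)^{\nu + \varepsilon}\right] \leq C e^{r_\star(\nu + \varepsilon)t} \]

for some finite constant $C$, using again the a.s. convergence in Proposition 2.2. Then, for $s \geq 0$ and $t \in \mathbb{R}$, we have

\[ e^{-rt} \, \mathbb{P}\left(e^{-r_\star(t+s)} Z_\star(t + s) \geq e^{-r_\star t}\right) \leq e^{-rt} \mathbf{1}_{\{t > 0\}} + C e^{r_\star \varepsilon t} \mathbf{1}_{\{t \leq 0\}} \]

and thanks to the dominated convergence theorem,

\[ j^\nu \int_0^\infty e^{-rt} \, \mathbb{P}(Z_\star(t) \geq j) \, dt \underset{j \to +\infty}{\longrightarrow} \int_{\mathbb{R}} e^{-rt} \, \mathbb{P}\left(\mathcal{E}_\star \geq e^{-r_\star t}\right) dt = \frac{1}{\mu} \int_{\mathbb{R}} e^{-rt} \, e^{-e^{-r_\star t}/\mu} \, dt. \]

The change of variables $x = e^{-r_\star t}$ in the last integral leads to

\[ j^\nu \int_0^\infty e^{-rt} \, \mathbb{P}(Z_\star(t) \geq j) \, dt \longrightarrow \frac{1}{\mu r_\star} \int_0^\infty x^{\nu - 1} e^{-x/\mu} \, dx = \frac{\Gamma(\nu) \, \mu^{\nu - 1}}{r_\star}, \]

which, since $(r - r_\star)/r_\star = \gamma \mu$, terminates the proof. □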

3.3.2 Critical case 

We want to obtain a similar result to Proposition 3.6 when the clonal population is critical.
It seems that this is not possible in a general setting due to the non-explicit expression of
the functions $W_p$ and $W_\theta$. However, in the exponential and critical case, we have the simpler
expression $W_\star(t) = 1 + b_\star t$ and $r_\star = 0$. Then, we have

\[ J^i = r \int_0^\infty \left(1 - \frac{1}{1 + b_\star u}\right)^{i-1} \frac{e^{-ru}}{(1 + b_\star u)^2} \, du. \]

Proposition 3.8. In the exponential case, we have

\[ J^i \underset{i \to +\infty}{\sim} C(\gamma) \, i^{-3/4} \, e^{-2\sqrt{\gamma i}} \]

where we recall that here $\gamma = r/b_\star = r/d_\star$ and we have set $C(x) = \sqrt{\pi} \, e^{x/2} x^{5/4}$ for $x > 0$.
Proof. By a change of variables, we have

\[ J^i = \gamma \int_0^\infty \frac{t^{i-1} e^{-\gamma t}}{(1 + t)^{i+1}} \, dt = \gamma \, \Gamma(i) \, U(i, 0, \gamma), \]

where $U$ is known as a confluent hypergeometric function (see [1, Ch.13]). Then, using [29,
Thm 3.3.2] with B = −2, we have the result. □

4 Asymptotic results about large and old families



We now state results about ages of the oldest families and about sizes of the largest ones. We 
mainly focus on the case when clonal populations are subcritical. Then, in Subsection 4.3, 
we explain which results hold in the critical and supercritical cases. 
We need some notation. For t > 0, 

• for $a \geq 0$, let $O_t(a)$ be the number of extant families at time $t$, with ages greater than
$a$ (O for "old"); for convenience, we set $O_t(a) = 0$ if $a < 0$.

• for $x \in \mathbb{R}$, let $L_t(x)$ be the number of families with sizes greater than $x$ at time $t$ (L for
"large").

In this section, we are interested in finding the orders of magnitudes of the ages and of the
sizes of the families, that is, in finding numbers $c_t$ and $x_t$ such that $\mathbb{E}[O_t(c_t)]$ and $\mathbb{E}[L_t(x_t)]$
converge to positive and finite real numbers as $t \to +\infty$.

4.1 Ages of old families in the subcritical case 

In this section, we suppose that the clonal processes are subcritical and we are interested in 
ages of old families. Although we only state the results in the exponential case, they also 
hold in the general case and are proved in [30, Ch. 3] and [6]. However, to obtain the general 
results in Model I, additional assumptions about the lifespan measure $\Lambda$ are required, which
are easily satisfied in the exponential case (for instance, we need the existence of a negative
root of $\psi_p$, which, with easy computations, is $b(1 - p) - d$ in the exponential case).

In the first result, which is a result in expectation, we show that in both models, the ages
are of order of magnitude

$$c_t := \frac{r}{r - r_\star}\, t.$$

Proposition 4.1 ([30, 6]). We suppose that $Z_\star$ is subcritical. For $a \in \mathbb{R}$, we have

E[O t (a + C t )} — > M e -(r-r*)a_ 

This result is a consequence of the expected spectrum formula (9), summed over $i \ge 1$
and integrated on $(a + c_t, t)$. We also obtain a more precise result about the convergence in
distribution of $O_t(a + c_t)$ as $t \to +\infty$.
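
The order of magnitude $c_t$ can be guessed from a rough balance argument. The sketch below is a heuristic of ours, not the proof given in [30, 6]. It assumes that new families are founded at a rate proportional to $e^{rs}$ at time $s$, and that a clonal family survives for a duration $u$ with probability of order $e^{r_\star u}$, where $r_\star < 0$ is the quantity written r* in the text. The symbol $K$ is an unspecified positive constant.

```latex
% Heuristic only: u is the age of a family, founded at time t - u.
\begin{align*}
\mathbb{E}[O_t(x)]
  &\approx K \int_x^t e^{r(t-u)}\, e^{r_\star u}\, du
   \approx \frac{K}{r - r_\star}\, e^{rt - (r - r_\star)x}, \\
\intertext{which stays of order one if and only if}
x &= \frac{r}{r - r_\star}\, t + a = c_t + a, \qquad a \in \mathbb{R}, \\
\intertext{in which case}
e^{rt - (r - r_\star)(c_t + a)} &= e^{-(r - r_\star)a}.
\end{align*}
```

This is consistent with the exponential dependence on $a$ in Proposition 4.1; the sketch does not attempt to identify the constant.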

Proposition 4.2 ([30, 6]). With the same assumptions as in Proposition 4.1, for $a \in \mathbb{R}$,
conditional on the survival event, as $t \to \infty$, $O_t(a + c_t)$ converges in distribution to a r.v. $O$,
distributed as a mixed Poisson r.v. whose parameter of mixture is

$$E\, \frac{b\,|r_\star|}{r\, d_\star}\, e^{-(r - r_\star)a},$$

where $E$ is an exponential r.v. with mean 1. Equivalently, $O$ is geometric on $\{0, 1, \dots\}$ with
success probability

$$\frac{1}{1 + \frac{b\,|r_\star|}{r\, d_\star}\, e^{-(r - r_\star)a}}.$$
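
The equivalence between the two descriptions of $O$ is a standard computation, spelled out here for convenience. The symbol $\lambda$ is our shorthand for the deterministic factor multiplying $E$ in the parameter of mixture.

```latex
% O given E is Poisson(lambda E), with E ~ Exp(1) and lambda > 0 deterministic.
\begin{align*}
\mathbb{P}(O = k)
  &= \int_0^{\infty} e^{-y}\, e^{-\lambda y}\, \frac{(\lambda y)^k}{k!}\, dy
   = \frac{\lambda^k}{k!} \cdot \frac{k!}{(1+\lambda)^{k+1}}
   = \frac{1}{1+\lambda} \left( \frac{\lambda}{1+\lambda} \right)^{k},
   \qquad k \ge 0.
\end{align*}
% Hence O is geometric on {0, 1, ...} with success probability 1/(1 + lambda),
% and its mean is lambda.
```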
The proof of this proposition in the general case and for Model I, given in [30], follows
arguments of Taïb in [34] and uses the notion of CMJ processes counted with time-dependent
random characteristics developed by Jagers and Nerman in [19, 20]. The difference with
(13) is that here the characteristics are allowed to depend on time. This theory provides
convergences in distribution, as $t \to +\infty$, of quantities of the form

$$Z_t^{\chi_t} := \sum_{x} \chi_{t,x}(t - \sigma_x)\, \mathbf{1}_{\{\sigma_x \le t\}},$$

under technical conditions about the family of characteristics $(\chi_t(\cdot), t \ge 0)$. The proof of
Proposition 4.2 for Model II is given in [6] and does not make use of random characteristics. 

The last result deals with the convergence in distribution of the sequence of ranked ages
of extant families. Let $\mathcal{M}(\mathbb{R})$ be the set of non-negative $\sigma$-finite measures on $\mathbb{R}$ that are finite
on $\mathbb{R}_+$, equipped with the left-vague topology induced by the maps $\nu \mapsto \int_{\mathbb{R}} f(x)\, \nu(dx)$ for all
bounded continuous functions $f$ such that there exists $x_0 \in \mathbb{R}$ satisfying $f(x) = 0$ for all $x < x_0$.

Theorem 4.3 ([30, 6]). With the same assumptions as previously, let $X_t$ be the point process
defined by

$$X_t(dx) := \sum_{k \ge 1} \delta_{A^k_t - c_t}(dx),$$

where $A^1_t > A^2_t > \cdots$ is the decreasing sequence of ages of alive families at $t$. Then, conditional
on the survival event, $X_t$ converges as $t \to \infty$ in $\mathcal{M}(\mathbb{R})$ equipped with the left-vague topology
to a mixed Poisson point process with intensity measure

$$\frac{b\,|r_\star|}{r\, d_\star}\, E\, (r - r_\star)\, e^{-(r - r_\star)x}\, dx,$$

where $E$ is an exponential r.v. with mean 1.
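
As a consistency check (a remark of ours, not taken from [30, 6]), write the intensity measure as $C E \kappa e^{-\kappa x} dx$, with $\kappa = r - r_\star$ and $C$ the deterministic prefactor. The mass of $(a, +\infty)$ then has the same form as the parameter of mixture in Proposition 4.2. Assuming the convergence carries over to counts on $(a, +\infty)$, the void probability gives the limiting law of the recentred age of the oldest family, denoted here $A^1_t - c_t$.

```latex
\begin{align*}
\int_a^{\infty} C E\, \kappa\, e^{-\kappa x}\, dx &= C E\, e^{-\kappa a}, \\
\lim_{t \to \infty} \mathbb{P}\big(A^1_t - c_t \le a \,\big|\, \text{survival}\big)
  &= \mathbb{E}\big[\exp\big(-C E\, e^{-\kappa a}\big)\big]
   = \frac{1}{1 + C e^{-\kappa a}}.
\end{align*}
% The last equality uses E[exp(-sE)] = 1/(1+s) for E ~ Exp(1) and s >= 0.
```

Under these assumptions the limit is a logistic distribution function in $a$, with scale $1/\kappa$.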

4.2 Sizes of largest families in the subcritical case of Model II 

In this paragraph, we still suppose that the clonal process is subcritical and we are interested
in results similar to those of Subsection 4.1 about the sizes of the largest families. The aim
is to find a number $x_t$ such that $L_t(x_t)$ converges to a finite and positive limit as $t \to +\infty$.

Concerning Model I, this problem is still open. On the contrary, it is possible to obtain in
Model II the sizes of the largest families. In [6], they are given for any life length distribution
but to simplify the results, we only state them in the exponential case. The following result
is a consequence of (10-11) applied with $a = t$ and summed over $i > x_t + c$. Recall that the
clonal process is assumed to be subcritical, so that $\theta > r$.

Proposition 4.4 ([6]). We set 

"log (A) ' 

Then, for c G R, 

E[L l ( It + c)] l ^^( M , ^ -)( ; ^) C " I+, " I " C, 

where $\{x\}$ denotes the fractional part of a real number $x$ and where $A(b, d, \theta)$ is an explicit
constant that only depends on $b$, $d$ and $\theta$.
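
The fractional part in the limit reflects the fact that family sizes are integers. The following generic illustration uses a hypothetical integer-valued size $S$ with an exact geometric tail of ratio $\rho \in (0, 1)$. This is a toy assumption of ours, not the actual law of family sizes studied in [6].

```latex
% Toy model: P(S > n) = rho^n for every integer n >= 0.
\begin{align*}
\mathbb{P}(S > x) &= \rho^{\lfloor x \rfloor} = \rho^{\,x - \{x\}},
   \qquad x \ge 0, \\
\mathbb{P}(S > x_t + c) &= \rho^{\,x_t}\, \rho^{\,c - \{x_t + c\}}.
\end{align*}
% The factor rho^{-{x_t + c}} oscillates as t grows, so no limit exists
% unless {x_t + c} is frozen, for instance along times t_n with x_{t_n} = n,
% where {n + c} = {c}.
```

This is why convergence in distribution is stated along a subsequence of times at which $x_t$ is an integer.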
For $t \ge 0$ and $k \ge 1$, we denote by the size of the $k$-th largest family in the whole
population at time t. Let 

k>l 

be the point measure of the renormalized sizes of the population. To get rid of fractional
parts, the following theorem gives convergence in distribution of $L_t(x_t + c)$ and $X_t$ along a
subsequence. More precisely, for $n \ge 1$, let $t_n$ be such that $x_{t_n} = n$; this equation has a
unique solution for any $n$ greater than some integer $n_0$. It satisfies

e-r. (e + d\ 

t n ~ —7— log —r- n. 

n->+oc (J \ J 

We now state the convergence of the sequence $(X_{t_n}, n \ge n_0)$.

Theorem 4.5 ([6]). Conditional on the survival event, the sequence $(X_{t_n}, n \ge n_0)$ of point
processes on $\mathbb{Z}$ converges as $n \to +\infty$ on the set $\mathcal{M}(\mathbb{R})$ equipped with the left-vague topology
to a mixed Poisson point measure on $\mathbb{Z}$ with intensity measure

where the mixture coefficient E is an exponential r.v. with mean 1. 

4.3 Other results 

4.3.1 Critical case in Model I 

The case of a critical clonal process $Z_p$ for a general supercritical splitting tree is treated in
Section 3.5.1 of [30] where the counterparts of Propositions 4.1 and 4.2 and Theorem 4.3 are
proved.

If $(1 - p)m = 1$, provided that the second moment $\sigma^2 := \int_0^{\infty} \Lambda(du)\, u^2$ is finite and that a
condition about the tail distribution of $\Lambda$ holds, ages of oldest families are of order

$$c_t = t - \frac{\log t}{r}.$$

Notice that these conditions about $\Lambda$ are trivially satisfied in the exponential case. These
results were also proved in [34, Ch. 4] for any CMJ-process $Z$, i.e. with a birth point process
as general as possible, but in that case, limits were not explicit.
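
The logarithmic correction can be understood through a rough balance between the exponential growth of the number of founders and the survival probability of a critical clonal family. The sketch below is a heuristic of ours, restricted to the exponential case, where the critical clonal process is a linear birth and death process with equal birth and death rates $\beta$ (a symbol introduced here) and survives up to time $u$ with probability $1/(1 + \beta u)$. The symbol $K$ is an unspecified positive constant.

```latex
% Heuristic only: u is the age of a family, founded at time t - u.
\begin{align*}
\mathbb{E}[O_t(x)]
  &\approx K \int_x^t \frac{e^{r(t-u)}}{1 + \beta u}\, du
   \approx \frac{K}{r \beta}\, \frac{e^{r(t-x)}}{x}
   \qquad (x \to \infty,\ t - x \to \infty), \\
\intertext{and with $x = t - \frac{\log t}{r} + a$,}
\frac{e^{r(t-x)}}{x} &= \frac{t\, e^{-ra}}{t - \frac{\log t}{r} + a}
   \xrightarrow[t \to \infty]{} e^{-ra}.
\end{align*}
```

The polynomial decay of the survival probability is what replaces the linear rescaling of the subcritical case by a logarithmic shift.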

Similarly to the subcritical case, the problem of sizes of the largest families is still open. 
Nevertheless, we can state the following conjecture about their order of magnitude. 



Conjecture 4.6. If 



2 



x t '■= —n(rt - t logt), 

as $t \to \infty$, on the survival event, $L_t(x_t)$ converges in distribution to a non-degenerate
geometric r.v.
4.3.2 Critical case in Model II



The general case when $Z_\theta$ is critical ($\theta = r$) can be found in Sections 3.4 and 5 in [6]. For
both ages and sizes, the counterparts of the results of Subsections 4.1 and 4.2 hold.

As in Model I, ages of the oldest families are of order $c_t = t - \frac{\log t}{r}$. Moreover, sizes of the
largest ones are of order

$$x_t = \frac{r^2}{4\psi'(r)} \left( t - \frac{\log t}{2r} \right)^2$$

and the point measure 




k>l 

converges to a mixed Poisson measure as $t \to +\infty$. Contrary to Theorem 4.5, this convergence
holds along the whole family of times and not only along a subsequence.

4.3.3 Sizes of largest families in supercritical cases 

In [30, Ch. 3], general splitting trees in Model I are considered. When the clonal process $Z_p$
is supercritical, that is, when $(1 - p)m > 1$, a result about the sizes of the largest families is
proved. First notice that, as in (5),

and on $\{Z_p(t) \to 0\}^c$, $e^{-r_p t} Z_p(t)$ a.s. converges as $t \to \infty$ to an exponential random variable.
Hence, the sizes of alive families at time $t$ must be of order $e^{r_p t}$ as $t \to +\infty$. We proved this
in [30] by showing that $\mathbb{E}[L_t(e^{r_p t})]$ converges as $t \to +\infty$ to an explicit limit.

Notice that we cannot obtain similar results to Proposition 4.2 and Theorem 4.3 concerning
the convergence in distribution of $L_t(e^{(b(1-p)-d)t})$ and the convergence of the associated point
measure of the decreasing sequence of family sizes.

In [34], Taïb considers a more general model than our Model I; the mutation mechanism is
the same but $Z_p$ can be any supercritical CMJ-process. In his Theorem 4.6, by using a
time-dependent characteristic argument, he proved the convergence in distribution of $L_t(e^{r_p t})$ (to
a non-explicit random variable). However, we have doubts about the application of Theorem
A.7, since the technical requirements of this theorem do not seem to hold in his case. These
technical requirements are proved to hold neither in [34] nor in [20].

In Model II, for a general supercritical splitting tree, if $Z_\theta$ is supercritical, that is, $r > \theta$,
$Z_\theta(t)$ asymptotically grows like $e^{(r-\theta)t}$. In [6, Prop. 3.2], it is proved that $\mathbb{E}[L_t(e^{(r-\theta)t})]$
converges as $t \to +\infty$, but we were unable to obtain any convergence in distribution in that
case.